Reduced Storage of Metadata in a Distributed Encoded Storage System

ABSTRACT

A data object can be encoded into a plurality of encoded data fragments and stored on backend storage elements in a distributed encoded storage system. The identifiers and metadata corresponding to each encoded fragment of the data object can be stored in a single metadata unit, which is stored on the backend as encoded fragments. The identifiers of the metadata fragments can be associated with the data object and stored on a low latency frontend storage device. Thus, the amount of metadata per data object stored on expensive low latency frontend storage is reduced to the fragment identifiers. The fragment identifiers can be quickly retrieved, and used to retrieve the identifiers and metadata corresponding to the encoded data fragments from the backend, for retrieval of the data object itself.

CROSS-REFERENCES TO RELATED APPLICATION

This application is related to the U.S. patent application filed with the U.S. Patent and Trademark Office under Attorney Docket No. WDA-3570-US on Apr. 24, 2018, having the same assignee and entitled Fast Read Operation Utilizing Reduced Storage of Metadata in a Distributed Encoded Storage System, the entire contents of which are hereby incorporated herein by reference.

TECHNICAL FIELD

The present disclosure pertains generally to storage systems, and more specifically to reduced storage of metadata in a distributed encoded storage system.

BACKGROUND

The rise in electronic and digital device technology has rapidly changed the way society communicates, interacts, and consumes goods and services. Modern computing devices allow organizations and users to have access to a variety of useful applications in many locations. Using such applications results in the generation of a large amount of data. Storing and retrieving the produced data is a significant challenge associated with providing useful applications and devices.

The data generated by online services and other applications can be stored at data storage facilities. As the amount of data grows, having a plurality of users sending and requesting data can result in complications that reduce efficiency and speed. Quick and reliable access in storage systems is important for good performance.

Distributed encoded storage systems typically divide each data object to be stored into a plurality of data pieces, each of which is encoded into a plurality of encoded data fragments. The encoded data fragments are spread across multiple backend storage elements, thereby providing a given level of redundancy. A distributed encoded storage system maintains metadata which identifies each stored data object, specifies where and how in the system each data object is stored, including where the encoded data fragments have been distributed and hence from where they can subsequently be retrieved, what type of encoding has been used, etc. For each encoded fragment of the data object, an identifier, location information and encoding information are maintained. Thus, storage of a single data object generates a large amount of associated metadata.

As noted above, a distributed encoded storage system stores the encoded data fragments on storage elements in the backend. However, because the corresponding metadata is accessed frequently and needs to be provided with a high level of responsiveness, it is typically stored on storage elements other than those of the backend, as storing it on the backend would lead to unacceptable delays. Typically the backend storage elements on which data objects are stored are in the form of hard disks, while the metadata is stored on expensive, fast, low latency storage elements, such as solid state disks (“SSDs”). The separate storage of metadata on SSDs leads to the problem of a higher cost and typically a reduced level of durability.

Additionally, the metadata storage needs to be provided with a suitable level of redundancy. In order to provide for a sufficient level of redundancy, the SSDs are often duplicated inside each datacenter, for example by making use of a triple modular redundancy configuration with majority vote logic to ensure redundancy against individual failures of the SSDs. In order to provide for a sufficient level of responsiveness, the metadata storage could also be duplicated in several geographically dispersed datacenters of the distributed encoded storage system. Further, the stored metadata is typically made accessible by means of high bandwidth connections and provided with high levels of processing power to guarantee the desired responsiveness when processing client requests. This results in the usage of a great deal of expensive, highly responsive storage elements, such as SSDs, expensive high bandwidth connections and expensive processing power.

It would be desirable to address at least these issues.

SUMMARY

A data object can be encoded into a plurality of encoded data fragments and stored on backend storage elements in a distributed encoded storage system. The identifiers and metadata corresponding to each encoded fragment of the data object can be stored in a single metadata unit, which can be encoded into a plurality of metadata fragments and distributed across the backend. The identifiers of the encoded metadata fragments can be associated with the data object and stored on a low latency frontend storage device. Thus, the amount of metadata per data object stored on expensive low latency frontend storage is reduced to the metadata fragment identifiers. The fragment identifiers can be quickly retrieved, and used to retrieve the identifiers and metadata corresponding to the encoded data fragments from the backend, for retrieval of the data object itself. Per stored data object, the size of metadata that is stored on expensive storage elements with a high level of responsiveness is thus greatly reduced, without appreciably compromising system level responsiveness. In addition, because the metadata itself is stored in encoded fragments on the backend, the level of redundancy for the stored metadata can be the same as that for the encoded data fragments. Thus, the overall redundancy level of the metadata and scalability are also improved.

More specifically, the data object can be divided into a plurality of data pieces, and each data piece can be encoded into a plurality of encoded data fragments. The plurality of encoded data fragments can be transmitted to the backend of the distributed encoded data storage system, for distribution across a plurality of backend storage elements, such that the distribution provides a specific level of redundancy. A corresponding fragment identifier can be received for each one of the plurality of encoded data fragments, from the backend of the distributed encoded data storage system. (In another embodiment, the fragment identifiers can be generated or selected on the frontend, and, e.g., provided to the backend for use.) The corresponding fragment identifier and associated metadata for each one of the plurality of encoded data fragments can be stored in a single metadata unit. The single metadata unit can be encoded into a plurality of encoded metadata fragments, which can, but need not, be the same fragment format as the encoded data fragments. Each one of the plurality of encoded metadata fragments can be transmitted to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the same specific level of redundancy as the encoded data fragments, and hence the underlying data object. In response to transmitting each one of the plurality of encoded metadata fragments, a corresponding fragment identifier can be received from the backend of the distributed encoded data storage system (or self-generated/selected on the frontend). The fragment identifiers corresponding to encoded metadata fragments can be associated with the data object. These fragment identifiers can be stored on a low latency frontend storage element, from which they can be quickly retrieved and used to obtain the data object.

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of these installed on the system, where the software, firmware and/or hardware cause(s) the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.

One general aspect includes a computer-implemented method comprising: encoding a data object into a plurality of encoded data fragments; transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; in response to transmitting each one of the plurality of encoded data fragments, receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; for each one of the plurality of encoded data fragments, storing the corresponding fragment identifier and associated metadata in a single metadata unit; encoding the single metadata unit into a plurality of encoded metadata fragments; transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; in response to transmitting each one of the plurality of encoded metadata fragments, receiving, for each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; associating the fragment identifiers corresponding to the encoded metadata fragments with the data object; and storing the fragment identifiers corresponding to the encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements.

Another general aspect includes a computer system comprising: a processor; system memory; a plurality of electromechanical backend storage elements; a solid state frontend storage element; instructions in the system memory programmed to encode a data object into a plurality of encoded data fragments; instructions in the system memory programmed to transmit each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across the plurality of electromechanical backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of electromechanical backend storage elements provides a specific level of redundancy; instructions in the system memory programmed to receive, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system, in response to transmitting each one of the plurality of encoded data fragments; instructions in the system memory programmed to store, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata in a single metadata unit; instructions in the system memory programmed to encode the single metadata unit into a plurality of encoded metadata fragments; instructions in the system memory programmed to transmit each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; instructions in the system memory programmed to receive, in response to transmitting each one of the plurality of encoded metadata fragments, for each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; instructions in the system memory programmed to associate the fragment identifiers corresponding to the encoded metadata fragments with the data object; and instructions in the system memory programmed to store the fragment identifiers corresponding to the encoded metadata fragments in the solid state frontend storage element, wherein the solid state frontend storage element provides faster access to stored content than the electromechanical backend storage elements.

Another general aspect includes a computer system comprising: means for encoding a data object into a plurality of encoded data fragments; means for transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; means for receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system, in response to transmitting each one of the plurality of encoded data fragments; means for storing, for each one of the plurality of encoded data fragments, the corresponding fragment identifier and associated metadata in a single metadata unit; means for encoding the single metadata unit into a plurality of encoded metadata fragments; means for transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; means for receiving, for each one of the plurality of encoded metadata fragments, in response to transmitting each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; means for associating the fragment identifiers corresponding to the encoded metadata fragments with the data object; and means for storing the fragment identifiers corresponding to the encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements.

Other embodiments of this aspect include corresponding computer systems, system means, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the method(s).

Some implementations may optionally include one or more of the following features: that providing a specific level of redundancy further comprises: providing a predetermined level of storage redundancy associated with the distributed encoded data storage system; that the backend storage elements further comprise electromechanical storage devices; that the frontend storage element further comprises a solid state storage device; that the frontend storage element further comprises flash memory; that the frontend storage element further comprises dynamic random access memory; that storing the corresponding fragment identifier and associated metadata, for each one of the plurality of encoded data fragments, in a single metadata unit further comprises: for each one of the plurality of encoded data fragments, storing the corresponding fragment identifier received from the backend of the distributed encoded data storage system in a temporary buffer; for each one of the plurality of encoded data fragments, storing the associated metadata in the temporary buffer; and copying contents of the temporary buffer into the single metadata unit; that storing the corresponding fragment identifier and associated metadata, for each one of the plurality of encoded data fragments, in a single metadata unit further comprises: storing fragment identifiers and associated metadata in a first section of the metadata unit; and storing a first portion of contents of the data object in a second section of the metadata unit; that metadata associated with each specific one of the plurality of encoded data fragments further comprises a storage location of the specific encoded data fragment and encoding information concerning the specific encoded data fragment; duplicating the fragment identifiers corresponding to the encoded metadata fragments across a plurality of frontend storage elements; transmitting a copy of content of the frontend storage element to the backend of the distributed encoded data storage system for storage, retrieving the stored copy from the backend, responsive to a loss of content of the frontend storage element, and storing the retrieved copy on the frontend storage element; caching content of at least one electromechanical backend storage element on the solid state frontend storage element; and that encoding a data object into a plurality of encoded data fragments further comprises dividing the data object into a plurality of data pieces, and encoding each one of the data pieces into a plurality of encoded data fragments.

Note that the above list of features is not all-inclusive, and many additional features and advantages are contemplated and fall within the scope of the present disclosure. Moreover, the language used in the present disclosure has been principally selected for readability and instructional purposes, and not to limit the scope of the subject matter disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a distributed encoded storage system in which an encoded distributed metadata storage manager can operate, according to one embodiment.

FIG. 2 is a diagram illustrating the operation of an encoded distributed metadata storage manager, according to one embodiment.

FIG. 3 is a flowchart illustrating the operation of an encoded distributed metadata storage manager, according to one embodiment.

The Figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

DETAILED DESCRIPTION

The present disclosure describes technology, which may include methods, systems, apparatuses, computer program products, and other implementations, for reducing storage of metadata in a distributed encoded storage system. Rather than storing all of the metadata corresponding to the encoded fragments of stored data objects on expensive, low latency frontend storage devices (e.g., flash memory, DRAM, etc.), the identifiers and metadata corresponding to each distributed encoded fragment of a data object are stored in a metadata unit, which is stored on the backend as a plurality of encoded metadata fragments. The identifiers of the metadata fragments are associated with the data object and stored on a low latency frontend storage device. Thus, the amount of metadata per data object stored on expensive low latency frontend storage is reduced to the metadata fragment identifiers. These fragment identifiers can be quickly retrieved, and used to retrieve the identifiers and metadata corresponding to the encoded data fragments from the backend, for retrieval of the data object itself. Thus, per data object, the size of metadata that is stored on expensive storage elements with a high level of responsiveness is greatly reduced, without appreciably compromising system level responsiveness. In addition, because the metadata itself is stored in encoded fragments on the backend, the level of redundancy for the stored metadata can be the same as that for the encoded data fragments. Thus, the overall redundancy level of the metadata and scalability are also improved.

FIG. 1 illustrates an exemplary datacenter 109 in a distributed encoded storage system 100 in which an encoded distributed metadata storage manager 101 can operate, according to one embodiment. In the illustrated distributed encoded storage system 100, datacenter 109 comprises storage servers 105A, 105B and 105N, which are communicatively coupled via a network 107. An encoded distributed metadata storage manager 101 is illustrated as residing on storage server 105A. It is to be understood that the encoded distributed metadata storage manager 101 can reside on more, fewer or different computing devices, and/or can be distributed between multiple computing devices, as desired. In FIG. 1, storage server 105A is further depicted as having storage devices 160A(1)-(N) attached, storage server 105B is further depicted as having storage devices 160B(1)-(N) attached, and storage server 105N is depicted with storage devices 160N(1)-(N) attached. It is to be understood that storage devices 160A(1)-(N), 160B(1)-(N) and 160N(1)-(N) can be instantiated as electromechanical storage such as hard disks, solid state storage such as flash memory, other types of storage media, and/or combinations of these.

Although three storage servers 105A-N each coupled to three devices 160(1)-(N) are illustrated for visual clarity, it is to be understood that the storage servers 105A-N can be in the form of rack mounted computing devices, and datacenters 109 can comprise many large storage racks each housing a dozen or more storage servers 105, hundreds of storage devices 160 and a fast network 107. It is further to be understood that although FIG. 1 only illustrates a single datacenter 109, a distributed encoded storage system can comprise multiple datacenters 109, including datacenters 109 located in different cities, countries and/or continents.

It is to be understood that although the embodiment described in conjunction with FIGS. 2-3 is directed to object storage, in other embodiments the encoded distributed metadata storage manager 101 can operate in the context of other storage architectures. As an example of another possible storage architecture according to some embodiments, server 105A is depicted as also being connected to a SAN fabric 170 which supports access to storage devices 180(1)-(N). Intelligent storage array 190 is also shown as an example of a specific storage device accessible via SAN fabric 170. As noted above, SAN 170 is shown in FIG. 1 only as an example of another possible architecture to which the encoded distributed metadata storage manager 101 might be applied in another embodiment. In yet other embodiments, shared storage can be implemented using FC and iSCSI (not illustrated) instead of a SAN fabric 170.

Turning to FIG. 2, in one example embodiment, the encoded distributed metadata storage manager 101 reduces the size of the metadata 207 stored on low latency frontend storage element 209, and better matches the level of redundancy of the metadata 207 to the level of redundancy of the corresponding stored data. As shown in FIG. 2, the encoded distributed metadata storage manager 101 first divides a data object 201 (for example of size 256 MB, although data objects 201 can be of different sizes in different embodiments) into a plurality of data pieces 202. FIG. 2 illustrates the data object 201 being divided into four data pieces (202A, 202B, 202C and 202D) for clarity of illustration. Typically, a data object 201 would be divided into (many) more data pieces 202 in practice. For example, each data piece 202 could be of size 1 MB, although data pieces 202 can be of different sizes in different embodiments.
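The division step can be illustrated with a minimal Python sketch. The function name split_into_pieces and the fixed-size slicing strategy are assumptions made for illustration; the disclosure does not prescribe how the data object 201 is divided.

```python
from typing import List


def split_into_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Divide a data object into consecutive pieces of at most piece_size bytes.

    Hypothetical helper: the last piece may be shorter than piece_size.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


# Small stand-in for a data object; a 256 MB object with 1 MB pieces
# would be split the same way into 256 pieces.
pieces = split_into_pieces(b"0123456789", 4)
```

Concatenating the pieces in order reproduces the original object, which is why relationship information between pieces only needs to record their order.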

The encoded distributed metadata storage manager 101 encodes each one of these data pieces 202 into a plurality of corresponding encoded data fragments 203, which are distributed across multiple backend storage elements 213 to provide a given level of redundancy. For example, FIG. 2 illustrates each separate data piece 202A-D being encoded into three separate encoded data fragments, e.g., 203A1, 203A2 and 203A3, etc. It is to be understood that three is just an example number of data fragments 203, and data pieces 202 can be divided into more (or fewer) data fragments 203 as desired. It is to be further understood that the specific encoding format used to encode the data pieces 202 into encoded data fragments 203 can vary between embodiments, as can the size of the encoded data fragments 203.
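Because the encoding format is left open, the following Python sketch substitutes one simple possibility purely for illustration: two data halves plus one XOR parity fragment, giving three fragments per piece of which any two suffice to rebuild the piece. The names encode_piece and decode_piece, and the choice of XOR parity, are assumptions and are not taken from the disclosure.

```python
from typing import List, Optional


def _xor(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def encode_piece(piece: bytes) -> List[bytes]:
    """Encode a piece into three fragments: two halves and their XOR parity."""
    half = (len(piece) + 1) // 2
    first = piece[:half]
    second = piece[half:].ljust(half, b"\x00")  # pad so both halves match
    return [first, second, _xor(first, second)]


def decode_piece(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild a piece from three fragment slots, at most one of them None."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("at most one fragment may be missing")
    first, second, parity = fragments
    if first is None:
        first = _xor(second, parity)
    elif second is None:
        second = _xor(first, parity)
    return (first + second)[:length]  # drop any padding


fragments = encode_piece(b"hello world")
```

The original piece length is passed to decode_piece so that padding can be removed; in the terms of the disclosure, that length would be part of the encoding information kept in the metadata 207.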

The encoded distributed metadata storage manager 101 then provides these encoded data fragments 203 to the backend 211 of the distributed encoded data storage system 100, where they are redundantly distributed across multiple backend storage elements 213, thereby providing a specific level of redundancy (e.g., a predetermined level of storage redundancy associated with the distributed encoded data storage system 100). For example, in an embodiment in which each data piece 202 is encoded into three encoded data fragments 203, the data fragments can be distributed across three separate backend storage elements 213. The specific level of redundancy to provide is a variable design parameter.

It is to be understood that low latency frontend storage elements 209 may be in the form of SSDs such as flash memory devices, whereas backend storage elements 213 may be in the form of hard disks or other forms of electromechanical storage devices. Because solid state storage has significantly faster access times than the electromechanical storage, frontend storage elements 209 typically provide faster access to stored content than backend storage elements 213. In addition, solid state storage is significantly more expensive per megabyte than electromechanical storage.

For each encoded data fragment 203 transmitted to the backend 211 for storage, the backend 211 returns a corresponding fragment identifier 205. As noted above, in other embodiments fragment identifiers can be generated/selected on the frontend and provided to the backend for use. In addition, metadata 207 concerning each encoded data fragment 203 stored on the backend 211 is generated, such as the storage location of the specific encoded data fragment 203 (e.g., information enabling the subsequent retrieval of the encoded data fragment 203), encoding information concerning the specific encoded data fragment 203 (e.g., the encoding format used and any relevant parameters), relationship information between the specific encoded data fragment 203 and its corresponding data piece 202, relationship information between the specific encoded data fragment 203 and other encoded data fragments 203 and/or relationship information between the multiple data pieces 202 of the data object 201 (e.g., information enabling the reconstruction of the underlying data object 201 from the plurality of fragments 203), etc. It is to be understood that the specific format and/or content of metadata 207 concerning encoded data fragments 203 can vary between embodiments.

For each encoded data fragment 203, the encoded distributed metadata storage manager 101 stores the corresponding fragment identifier 205 and associated metadata 207 in a single metadata unit 214, which is then encoded into a plurality of encoded metadata fragments, e.g., 215A, 215B and 215C. It is to be understood that the metadata unit 214 is analogous to a data piece 202. In addition, the metadata unit 214 can be the same size (e.g., 1 MB) as the data pieces 202. In one embodiment, in order to construct the metadata unit 214, as fragment identifiers 205 are returned from the backend 211, the fragment identifiers 205 and corresponding metadata 207 are stored in a temporary buffer. The contents of the temporary buffer are then copied into the single metadata unit 214. In one embodiment, the fragment identifiers 205 and associated metadata 207 are stored in a first section of the metadata unit 214, and a first portion of contents of the data object 201 is stored in a second section of the metadata unit 214. In this embodiment, when the data object 201 is being retrieved from the backend 211, the first portion of the data object 201 can be quickly retrieved and provided to the requesting party as described in more detail below.
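One way to picture the two-section metadata unit 214 is the Python sketch below. The layout (a 4-byte big-endian length prefix, a JSON-encoded first section, then the raw first portion of the object) and the names build_metadata_unit and parse_metadata_unit are assumptions chosen for illustration; the disclosure does not define a serialization format.

```python
import json
import struct
from typing import Any, Dict, List, Tuple


def build_metadata_unit(entries: List[Dict[str, Any]],
                        object_prefix: bytes = b"") -> bytes:
    """Pack fragment identifiers and metadata (first section) followed by an
    optional first portion of the data object (second section)."""
    buffer = bytearray()  # plays the role of the temporary buffer
    first_section = json.dumps(entries, sort_keys=True).encode("utf-8")
    buffer += struct.pack(">I", len(first_section))  # assumed length prefix
    buffer += first_section
    buffer += object_prefix
    return bytes(buffer)  # copy of the buffer contents becomes the unit


def parse_metadata_unit(unit: bytes) -> Tuple[List[Dict[str, Any]], bytes]:
    """Split a metadata unit back into its entries and the object prefix."""
    (first_len,) = struct.unpack(">I", unit[:4])
    entries = json.loads(unit[4:4 + first_len].decode("utf-8"))
    return entries, unit[4 + first_len:]


# Hypothetical identifiers and metadata fields, for illustration only.
entries = [
    {"fragment_id": "a1", "piece": 0, "location": "element-1"},
    {"fragment_id": "a2", "piece": 0, "location": "element-2"},
]
unit = build_metadata_unit(entries, object_prefix=b"first bytes of object")
```

Keeping the object prefix in the same unit means that a single retrieval of the metadata fragments yields both the fragment map and the start of the object contents.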

The encoded distributed metadata storage manager 101 transmits the multiple encoded metadata fragments 215 encoded from the metadata unit 214 to the backend 211 of the distributed encoded data storage system 100 for storage, as was done with the encoded data fragments 203. In response to transmitting each encoded metadata fragment 215, a corresponding fragment identifier 205 is received from the backend 211. In other words, the backend 211 returns a fragment identifier 205 for each encoded metadata fragment 215, as it did for the encoded data fragments 203. And like the encoded data fragments 203, the backend 211 redundantly distributes the encoded metadata fragments 215 across multiple backend storage elements 213, thereby providing the same level of redundancy for the encoded metadata fragments 215 as for the encoded data fragments 203 (e.g., the predetermined level of storage redundancy associated with the distributed encoded data storage system 100).

The encoded distributed metadata storage manager 101 associates the metadata fragment identifiers 205 with the data object 201, and stores the metadata fragment identifiers 205 on a frontend storage element 209, which as noted above provides faster access to stored content than backend storage elements 213. It is to be understood that the metadata fragment identifiers 205, which are just the fragment identifiers 205 that identify the metadata fragments 215, are much smaller (for example, 16 bytes) than the identifiers 205 and associated metadata 207 (e.g., hundreds of kilobytes) corresponding to all of the encoded data fragments 203. Thus, the space consumed per data object 201 of the fast, low latency, expensive frontend storage elements 209 is reduced.
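The size difference can be made concrete with a rough calculation, sketched here in Python. The figures are illustrative assumptions: 256 data pieces (a 256 MB object in 1 MB pieces), three fragments per piece, 16 bytes per identifier, and a hypothetical 200 bytes of metadata per encoded data fragment; the disclosure gives only the 16 byte and "hundreds of kilobytes" orders of magnitude.

```python
def frontend_bytes_per_object(num_metadata_fragments: int, id_size: int) -> int:
    """Frontend footprint when only metadata fragment identifiers are kept."""
    return num_metadata_fragments * id_size


def full_metadata_bytes(num_pieces: int, fragments_per_piece: int,
                        id_size: int, metadata_per_fragment: int) -> int:
    """Footprint when every data fragment identifier and its metadata are kept."""
    total_fragments = num_pieces * fragments_per_piece
    return total_fragments * (id_size + metadata_per_fragment)


reduced = frontend_bytes_per_object(num_metadata_fragments=3, id_size=16)
conventional = full_metadata_bytes(num_pieces=256, fragments_per_piece=3,
                                   id_size=16, metadata_per_fragment=200)
print(reduced, conventional)  # 48 165888
```

Under these assumed numbers the frontend holds 48 bytes per data object instead of roughly 162 KB, a reduction of more than three orders of magnitude.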

In one embodiment, low latency frontend storage elements 209 may be in the form of media other than flash memory, such as dynamic random access memory (DRAM). DRAM is typically faster but more expensive than flash memory. Unlike flash memory, DRAM is volatile, quickly losing its data without power. In one embodiment a copy of the contents of a frontend storage element 209 (e.g., the metadata fragment identifiers 205, which are small in size) can be provided to the backend 211 for storage. This enables the contents of the frontend storage element 209 to be rebuilt from the copy stored on the backend 211, for example in the case where the contents of the frontend storage are lost due to, e.g., a lapse in supplied power to volatile storage elements such as DRAM.

Because the amount of frontend storage space utilized is minimal, content stored on the backend 211 (e.g., encoded data fragments 203) can be cached to frontend storage in one embodiment, thereby improving overall performance. The amount of frontend storage space to utilize as a cache in this context is a variable design parameter.

Note that even where the metadata fragment identifiers 205 are distributed across multiple frontend storage elements for redundancy (e.g., multiple frontend storage elements 209 per datacenter 109, or across multiple datacenters 109), the total amount of low latency frontend storage space utilized is still orders of magnitude less than in conventional systems in which all of the metadata 207 is stored on the frontend 209.

In addition, the use of the encoded distributed metadata storage manager 101 as described above has minimal impact on the operation of the distributed storage backend 211, as the metadata fragments 215 can be processed by the backend 211 in the same or a similar way as any other encoded fragments 203. Thus, the metadata fragments 215 can be stored on the backend 211 in a way that provides the same level of redundancy as is provided for the data object 201.

Because the fragment identifiers 205 that identify the metadata fragments 215 are stored on fast, low latency frontend storage 209, they can be retrieved quickly. Because the metadata fragment identifiers 205 are linked to the data object 201, they can be used to retrieve the data object 201 from the backend 211. More specifically, metadata fragments 215 identified by the metadata fragment identifiers 205 can first be retrieved from the backend 211, and used to recreate the single metadata unit 214. This metadata unit 214 contains the identifiers 205 and associated metadata 207 for all of the encoded data fragments 203 of the data object 201, which can thus be retrieved from the backend. As noted above, in one embodiment the metadata unit 214 also stores an initial part of the contents of the data object 201. As described in related application WDA-3570-US, the initial part of the contents of the data object 201 in the metadata unit 214 can be used to begin the fulfillment of an access request, while the encoded data fragments 203 are being retrieved.

Thus, the use of the encoded distributed metadata storage manager 101 as described above ensures that the storage capacity per data object 201 for metadata 207 on expensive low latency storage elements 209 with a high level of responsiveness is reduced, without compromising system level responsiveness. The overall redundancy level of the metadata 207 and scalability are also improved.

FIG. 3 is a flowchart illustrating steps that may be performed by the encoded distributed metadata storage manager 101, according to one embodiment. The encoded distributed metadata storage manager 101 encodes 301 a data object 201 into a plurality of encoded data fragments 203. The encoded distributed metadata storage manager 101 transmits 303 the plurality of encoded data fragments 203 to the backend 211 of the distributed encoded data storage system 100, for distribution across a plurality of backend storage elements 213, such that the distribution provides a specific level of redundancy. The encoded distributed metadata storage manager 101 receives 305 a corresponding fragment identifier 205 for each one of the plurality of encoded data fragments 203, from the backend 211 of the distributed encoded data storage system 100. The encoded distributed metadata storage manager 101 stores 307 the corresponding fragment identifier 205 and associated metadata 207 for each one of the plurality of encoded data fragments 203 in a single metadata unit 214. The encoded distributed metadata storage manager 101 encodes 309 the single metadata unit 214 into a plurality of encoded metadata fragments 215. (Note that metadata fragments 215 may but need not be of the same fragment format as the encoded data fragments 203.) The encoded distributed metadata storage manager 101 transmits 311 each one of the plurality of encoded metadata fragments 215 to the backend 211 of the distributed encoded data storage system 100, for distribution across the plurality of backend storage elements 213. (Note that the distribution of the plurality of encoded metadata fragments 215 across the plurality of backend storage elements 213 provides the same specific level of redundancy as the encoded data fragments 203, and hence the underlying data object 201.) In response to transmitting each one of the plurality of encoded metadata fragments 215, the encoded distributed metadata storage manager 101 receives 313 a corresponding fragment identifier 205 from the backend 211 of the distributed encoded data storage system 100. The encoded distributed metadata storage manager 101 associates 315 the fragment identifiers 205 corresponding to the encoded metadata fragments 215 with the data object 201, and stores 317 these fragment identifiers 205 on a low latency frontend storage element 209, from which they can be quickly retrieved and used to obtain the data object 201. As explained above, frontend storage elements 209 (e.g., SSDs) can provide faster access to stored content than backend storage elements 213 (e.g., hard disks).
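By way of a non-limiting illustration, the steps of FIG. 3, together with the corresponding retrieval path, can be sketched in a few lines of Python. The sketch makes several simplifying assumptions that are not part of this disclosure: the backend is an in-memory dictionary that assigns random identifiers, the frontend is a dictionary keyed by a data object name, the metadata unit is serialized as JSON, and the "encoding" is a plain split into four chunks that adds no redundancy (a real system would use an erasure code or similar). All class and function names are hypothetical.

```python
import json
import uuid


class Backend:
    """Hypothetical in-memory stand-in for the backend and its storage elements."""

    def __init__(self):
        self._store = {}

    def put(self, fragment):
        fragment_id = uuid.uuid4().hex  # the backend returns the fragment identifier
        self._store[fragment_id] = fragment
        return fragment_id

    def get(self, fragment_id):
        return self._store[fragment_id]


def encode(payload, k=4):
    """Toy encoding: split into k chunks. Adds NO redundancy, unlike a real code."""
    size = -(-len(payload) // k)  # ceiling division
    return [payload[i * size:(i + 1) * size] for i in range(k)]


def decode(fragments):
    """Inverse of the toy encoding: concatenate the chunks in order."""
    return b"".join(fragments)


def store_object(name, data, backend, frontend):
    # Steps 301-305: encode the data object, transmit the fragments, receive identifiers.
    data_ids = [backend.put(fragment) for fragment in encode(data)]
    # Step 307: place all identifiers and associated metadata in a single metadata unit.
    unit = json.dumps(
        {"fragments": [{"id": fid, "index": n} for n, fid in enumerate(data_ids)]}
    ).encode("utf-8")
    # Steps 309-313: encode the metadata unit, transmit its fragments, receive identifiers.
    metadata_ids = [backend.put(fragment) for fragment in encode(unit)]
    # Steps 315-317: associate the metadata fragment identifiers with the object
    # and store only those identifiers on the frontend.
    frontend[name] = metadata_ids


def load_object(name, backend, frontend):
    # Fast lookup on the frontend, then two rounds of backend retrieval.
    unit = json.loads(decode([backend.get(fid) for fid in frontend[name]]))
    ordered = sorted(unit["fragments"], key=lambda entry: entry["index"])
    return decode([backend.get(entry["id"]) for entry in ordered])


backend, frontend = Backend(), {}
payload = b"example data object contents"
store_object("object-a", payload, backend, frontend)
restored = load_object("object-a", backend, frontend)
```

In this sketch the frontend holds only four short identifiers for the data object, while the per-fragment identifiers and metadata reside on the backend in encoded form.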

FIGS. 1-2 illustrate an encoded distributed metadata storage manager 101 residing on a single storage server 105. It is to be understood that this is just an example. The functionalities of the encoded distributed metadata storage manager 101 can be implemented on other computing devices in other embodiments, or can be distributed between multiple computing devices. It is to be understood that although the encoded distributed metadata storage manager 101 is illustrated in FIG. 1 as a standalone entity, the illustrated encoded distributed metadata storage manager 101 represents a collection of functionalities, which can be instantiated as a single or multiple modules on one or more computing devices as desired.

It is to be understood that the encoded distributed metadata storage manager 101 can be instantiated as one or more modules (for example as object code or executable images) within the system memory (e.g., RAM, ROM, flash memory) of any computing device, such that when the processor of the computing device processes a module, the computing device executes the associated functionality. As used herein, the terms “computer system,” “computer,” “client,” “client computer,” “server,” “server computer” and “computing device” mean one or more computers configured and/or programmed to execute the described functionality. Additionally, program code to implement the functionalities of the encoded distributed metadata storage manager 101 can be stored on computer-readable storage media. Any form of tangible computer readable storage medium can be used in this context, such as magnetic or optical storage media. As used herein, the term “computer readable storage medium” does not mean an electrical signal separate from an underlying physical medium.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

The embodiments illustrated herein are described in enough detail to enable the disclosed teachings to be practiced. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined by the below claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The foregoing description, for the purpose of explanation, has been described with reference to specific example embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the possible example embodiments to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The example embodiments were chosen and described in order to best explain the principles involved and their practical applications, to thereby enable others to best utilize the various example embodiments with various modifications as are suited to the particular use contemplated.

Note that, although the terms “first,” “second,” and so forth may be used herein to describe various elements, these elements are not to be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present example embodiments. The first contact and the second contact are both contacts, but they are not the same contact.

The terminology used in the description of the example embodiments herein is for describing particular example embodiments only and is not intended to be limiting. As used in the description of the example embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also note that the term “and/or” as used herein refers to and encompasses any and/or all possible combinations of one or more of the associated listed items. Furthermore, the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, blocks, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, blocks, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.

As will be understood by those skilled in the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the portions, modules, servers, managers, components, functions, procedures, actions, layers, features, attributes, methodologies, data structures and other aspects are not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, divisions and/or formats. The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or limiting to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain relevant principles and their practical applications, to thereby enable others skilled in the art to best utilize various embodiments with or without various modifications as may be suited to the particular use contemplated.

What is claimed is:
 1. A computer-implemented method, comprising: encoding a data object into a plurality of encoded data fragments; transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; for each one of the plurality of encoded data fragments, storing a corresponding fragment identifier and associated metadata in a single metadata unit; encoding the single metadata unit into a plurality of encoded metadata fragments; transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; associating fragment identifiers corresponding to the encoded metadata fragments with the data object; and storing the fragment identifiers corresponding to the encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements.
 2. The computer-implemented method of claim 1, wherein providing a specific level of redundancy further comprises: providing a predetermined level of storage redundancy associated with the distributed encoded data storage system.
 3. The computer-implemented method of claim 1, wherein: the backend storage elements further comprise electromechanical storage devices.
 4. The computer-implemented method of claim 1, wherein: the frontend storage element further comprises a solid state storage device.
 5. The computer-implemented method of claim 1, wherein: the frontend storage element further comprises flash memory.
 6. The computer-implemented method of claim 1, wherein: the frontend storage element further comprises dynamic random access memory.
 7. The computer-implemented method of claim 1, wherein: each encoded metadata fragment is of a same fragment format as the encoded data fragments.
 8. The computer-implemented method of claim 1, wherein storing the corresponding fragment identifier and associated metadata, for each one of the plurality of encoded data fragments, in a single metadata unit further comprises: storing fragment identifiers and associated metadata in a first section of the metadata unit; and storing a first portion of contents of the data object in a second section of the metadata unit.
 9. The computer-implemented method of claim 1, wherein: metadata associated with each specific one of the plurality of encoded data fragments further comprises a storage location of the specific encoded data fragment and encoding information concerning the specific encoded data fragment.
 10. The computer-implemented method of claim 1, further comprising: duplicating the fragment identifiers corresponding to the encoded metadata fragments across a plurality of frontend storage elements.
 11. The computer-implemented method of claim 1, further comprising: transmitting a copy of content of the frontend storage element to the backend of the distributed encoded data storage system for storage; retrieving the stored copy from the backend, responsive to a loss of content of the frontend storage element; and storing the retrieved copy on the frontend storage element.
 12. The computer-implemented method of claim 1, further comprising: caching content of at least one backend storage element on the frontend storage element.
 13. The computer-implemented method of claim 1, wherein encoding a data object into a plurality of encoded data fragments further comprises: dividing the data object into a plurality of data pieces; and encoding each one of the data pieces into a plurality of encoded data fragments.
 14. The computer-implemented method of claim 1, further comprising: in response to transmitting each one of the plurality of encoded data fragments, receiving, for each one of the plurality of encoded data fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system; and in response to transmitting each one of the plurality of encoded metadata fragments, receiving, for each one of the plurality of encoded metadata fragments, a corresponding fragment identifier from the backend of the distributed encoded data storage system.
 15. A computer system comprising: a processor; system memory; a plurality of electromechanical backend storage elements; a solid state frontend storage element; instructions in the system memory programmed to encode a data object into a plurality of encoded data fragments; instructions in the system memory programmed to transmit each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across the plurality of electromechanical backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of electromechanical backend storage elements provides a specific level of redundancy; instructions in the system memory programmed to store, for each one of the plurality of encoded data fragments, a corresponding fragment identifier and associated metadata in a single metadata unit; instructions in the system memory programmed to encode the single metadata unit into a plurality of encoded metadata fragments; instructions in the system memory programmed to transmit each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of electromechanical backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of electromechanical backend storage elements provides the specific level of redundancy; instructions in the system memory programmed to associate fragment identifiers corresponding to the encoded metadata fragments with the data object; and instructions in the system memory programmed to store the fragment identifiers corresponding to the encoded metadata fragments in the solid state frontend storage element, wherein the solid state frontend storage element provides faster access to stored content than the electromechanical backend storage elements.
 16. The computer system of claim 15, wherein providing a specific level of redundancy further comprises: providing a predetermined level of storage redundancy associated with the distributed encoded data storage system.
 17. The computer system of claim 15, wherein storing the corresponding fragment identifier and associated metadata, for each one of the plurality of data fragments, in a single metadata unit further comprises: storing fragment identifiers and associated metadata in a first section of the metadata unit; and storing a first portion of contents of the data object in a second section of the metadata unit.
 18. The computer system of claim 15, wherein: metadata associated with each specific one of the plurality of encoded data fragments further comprises a storage location of the specific encoded data fragment and encoding information concerning the specific encoded data fragment.
 19. The computer system of claim 15, further comprising: instructions in the system memory programmed to duplicate the fragment identifier corresponding to the encoded metadata fragments across a plurality of solid state frontend storage elements.
 20. A computer system comprising: means for encoding a data object into a plurality of encoded data fragments; means for transmitting each one of the plurality of encoded data fragments to a backend of a distributed encoded data storage system, for distribution across a plurality of backend storage elements, wherein the distribution of the plurality of encoded data fragments across the plurality of backend storage elements provides a specific level of redundancy; means for storing, for each one of the plurality of encoded data fragments, a corresponding fragment identifier and associated metadata in a single metadata unit; means for encoding the single metadata unit into a plurality of encoded metadata fragments; means for transmitting each one of the plurality of encoded metadata fragments to the backend of the distributed encoded data storage system for distribution across the plurality of backend storage elements, wherein the distribution of the plurality of encoded metadata fragments across the plurality of backend storage elements provides the specific level of redundancy; means for associating fragment identifiers corresponding to the encoded metadata fragments with the data object; and means for storing the fragment identifiers corresponding to encoded metadata fragments in a frontend storage element, wherein the frontend storage element provides faster access to stored content than backend storage elements.